
    Data analysis in chemistry and bio-medical sciences

    Editorial.

    Funding: Ministerio de Economía y Competitividad (CTQ2013-41229-P; CTQ2013-41229-P/BQU; CTQ2016-74881-P); País Vasco. Gobierno (IT1045-1)

    Chiral Brønsted Acid-Catalyzed Enantioselective α-Amidoalkylation Reactions: A Joint Experimental and Predictive Study

    Enamides with a free NH group have been evaluated as nucleophiles in chiral Brønsted acid-catalyzed enantioselective α-amidoalkylation reactions of bicyclic hydroxylactams for the generation of quaternary stereocenters. A quantitative structure-reactivity relationship (QSRR) method has been developed as a tool to rationalize the enantioselectivity in this and related processes and to guide the choice of catalyst. This correlative perturbation theory (PT)-QSRR approach has been used to predict the effect of the structure of the substrate, nucleophile, and catalyst, as well as of the experimental conditions, on the enantioselectivity. In this way, trends for improving the experimental results can be identified without a long-term empirical investigation.

    Ministerio de Economía y Competitividad (CTQ2013-41229-P), IKERBASQUE foundation, Gobierno Vasco (IT-623-13) and Universidad del País Vasco/Euskal Herriko Unibertsitatea UPV/EHU are gratefully acknowledged for their financial support. Technical and human support provided by Servicios Generales de Investigación SGIker (UPV/EHU, MINECO, GV/EJ, ERDF and ESF) is also acknowledged.
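    The PT-QSRR idea of describing a query by how far it departs from a reference can be sketched as the deviation of a descriptor from its mean over all records that share an experimental condition. This is a minimal illustration assuming a simple mean-centered operator; the function name `pt_operator`, the record layout, the descriptor `logP`, and the toy values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def pt_operator(records, descriptor, condition):
    """Deviation of each record's descriptor from its condition mean.

    records    -- list of dicts (hypothetical layout)
    descriptor -- key of a numeric molecular descriptor
    condition  -- key of an experimental condition (e.g. solvent, catalyst)
    """
    # Group descriptor values by experimental condition.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[condition]].append(rec[descriptor])
    # Reference value per condition: the mean descriptor within that group.
    ref = {cond: mean(vals) for cond, vals in groups.items()}
    # Perturbation: how far each record lies from its condition's reference.
    return [rec[descriptor] - ref[rec[condition]] for rec in records]


# Toy data, not from the study.
reactions = [
    {"logP": 2.0, "solvent": "toluene"},
    {"logP": 3.0, "solvent": "toluene"},
    {"logP": 1.0, "solvent": "DCM"},
    {"logP": 1.5, "solvent": "DCM"},
]
print(pt_operator(reactions, "logP", "solvent"))  # → [-0.5, 0.5, -0.25, 0.25]
```

    Mean-centering per condition is only one possible choice of reference; the published PT operators may define the reference differently.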

    MIANN models in medicinal, physical and organic chemistry

    Reducing costs in terms of time, animal sacrifice, and material resources with computational methods has become a promising goal in Medicinal, Biological, Physical and Organic Chemistry. Many computational techniques can be used for this purpose, and almost all of them address a few fundamental tasks: type (1) methods quantify the molecular structure, and type (2) methods link the structure to the biological activity, among others. In particular, MARCH-INSIDE (MI), acronym for Markov Chain Invariants for Networks Simulation and Design, is a well-known QSAR method useful for type (1) tasks. In addition, the bio-inspired Artificial Intelligence (AI) algorithms called Artificial Neural Networks (ANNs) are among the most powerful type (2) methods. Combining MI with ANNs to seek QSAR models is a strategy referred to here as MIANN (MI & ANN models). One of the first applications of the MIANN strategy was the development of new QSAR models for drug discovery, and the strategy has since been extended to the QSAR study of proteins, protein-drug interactions, and protein-protein interaction networks. In this paper, we review for the first time several aspects of the MIANN strategy, including its theoretical basis, its implementation in web servers, and examples of applications in Medicinal and Biological Chemistry. We also report new applications of the MIANN strategy in Medicinal Chemistry and the first examples in Physical and Organic Chemistry. To this end, we developed new MIANN models for several self-assembly physicochemical properties of surfactants and for large reaction networks in organic synthesis. Some of the new examples also include previously unpublished experimental results.

    Funding: Ministerio de Ciencia e Innovación (CTQ2009-07733); Universidad del País Vasco (UFI11/22; GIU 094)
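    MARCH-INSIDE derives molecular descriptors from a Markov chain defined on the molecular graph. The sketch below only illustrates that general idea: it row-normalizes an (optionally weighted) adjacency matrix into transition probabilities and takes the traces of its first k powers as simple invariants. The weighting rule, the function names, and the use of plain traces are simplifying assumptions, not the published MI definitions.

```python
def stochastic_matrix(adjacency, weights=None):
    """Row-normalize a molecular graph into a transition matrix.

    p_ij = w_j / (sum of w_k over atoms k bonded to i); the atomic weights
    are a hypothetical choice (unit weights by default).
    """
    n = len(adjacency)
    weights = weights or [1.0] * n
    matrix = []
    for i in range(n):
        denom = sum(weights[k] for k in range(n) if adjacency[i][k])
        matrix.append([weights[j] / denom if adjacency[i][j] else 0.0
                       for j in range(n)])
    return matrix


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(adjacency, k_max, weights=None):
    """Traces of the first k_max powers of the stochastic matrix."""
    pi = stochastic_matrix(adjacency, weights)
    power = pi
    moments = []
    for _ in range(k_max):
        moments.append(sum(power[i][i] for i in range(len(power))))
        power = matmul(power, pi)
    return moments


# Heavy-atom graph of a three-atom chain (e.g. C-C-O), unit weights.
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
print(spectral_moments(chain, 3))  # → [0.0, 2.0, 0.0]
```

    Each trace sums the probabilities of returning to the starting atom after k steps, so odd powers vanish for this chain. In an MIANN-style workflow, vectors like these would be the type (1) inputs fed to an ANN.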

    Perturbation-Theory Machine Learning (PTML) Multilabel Model of the ChEMBL Dataset of Preclinical Assays for Antisarcoma Compounds

    Sarcomas are a group of malignant neoplasms of connective tissue whose etiology differs from that of carcinomas. Efforts to discover new drugs with antisarcoma activity have generated large datasets of multiple preclinical assays under different experimental conditions. For instance, the ChEMBL database contains outcomes of 37,919 different antisarcoma assays with 34,955 different chemical compounds. Furthermore, the experimental conditions reported in this dataset include 157 types of biological activity parameters, 36 drug targets, 43 cell lines, and 17 assay organisms. Considering this information, we propose combining perturbation theory (PT) principles with machine learning (ML) to develop a PTML model that predicts antisarcoma compounds. PTML models use a reference function that measures the probability of a drug being active under certain conditions (protein, cell line, organism, etc.). In this paper, we used linear discriminant analysis and neural networks to train and compare PT and non-PT models. All the explored models have accuracies of 89.19–95.25% in training and 89.22–95.46% in validation sets. PTML-based strategies reach similar accuracy but generate simpler models. Therefore, they may become a versatile tool for predicting antisarcoma compounds.

    Funding: Ministerio de Economía y Competitividad (CTQ2016-74881-P; UNLC08-1E-002; UNLC13-13-3503); Xunta de Galicia (ED431C 2018/49; ED431D 2017/16; ED431G/01; ED431D 2017/23); Gobierno Vasco (IT1045-16); Instituto de Salud Carlos III (PI17/0182)
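    A reference function of the kind described, the probability of a compound being active under a given combination of conditions, can be estimated empirically as the fraction of active outcomes within each condition tuple. This is a minimal sketch assuming that simple frequency estimate; the function name `reference_probability`, the tuple layout, and the placeholder condition values are hypothetical, not ChEMBL entries or the paper's exact definition.

```python
from collections import Counter


def reference_probability(assays):
    """Empirical p(active | conditions) for each condition tuple.

    assays -- iterable of (conditions, is_active) pairs; conditions is a
              hashable tuple such as (activity_parameter, cell_line, organism).
    """
    totals = Counter()
    actives = Counter()
    for conditions, is_active in assays:
        totals[conditions] += 1
        actives[conditions] += int(bool(is_active))
    return {cond: actives[cond] / totals[cond] for cond in totals}


# Toy records with placeholder condition values.
assays = [
    (("IC50", "cell_line_A", "Homo sapiens"), True),
    (("IC50", "cell_line_A", "Homo sapiens"), False),
    (("IC50", "cell_line_A", "Homo sapiens"), True),
    (("GI50", "cell_line_B", "Homo sapiens"), False),
]
p_ref = reference_probability(assays)
print(p_ref[("IC50", "cell_line_A", "Homo sapiens")])  # → 0.6666666666666666
```

    In a PTML classifier this value would typically enter as one input alongside PT operators; sparse condition tuples with few records give noisy estimates.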

    Perturbation-Theory and Machine Learning (PTML) Model for High-Throughput Screening of Parham Reactions: Experimental and Theoretical Studies

    Machine Learning (ML) algorithms are gaining importance in the processing of chemical information and the modelling of chemical reactivity problems. In this work, we developed a PTML model combining Perturbation-Theory (PT) and ML algorithms to predict the yield of a given reaction. For this purpose, we selected the Parham cyclization, a general and powerful tool for the synthesis of heterocyclic and carbocyclic compounds. This reaction has both structural variables (substitution pattern on the substrate, internal electrophile, ring size, etc.) and operational variables (organolithium reagent, solvent, temperature, time, etc.), so predicting how changes in substrate design (internal electrophile, halide, etc.) or reaction conditions affect the yield could help optimize the reaction design. The PTML model uses PT operators to account for perturbations in the experimental conditions and/or structural variables of all the molecules involved in a query reaction compared to a reference reaction. To this end, a dataset of >100 reactions was collected for different substrates and internal electrophiles, under different reaction conditions, with a wide range of yields (0–98%). The best PTML model, found using General Linear Regression (GLR), has R = 0.88 in training and R = 0.83 in external validation series for 10,000 pairs of query and reference reactions. The PTML model has a final R = 0.95 for all reactions when multiple reference reactions are used. We also report a comparative study of linear vs. non-linear PTML models based on Artificial Neural Network (ANN) algorithms. PTML-ANN models (LNN, MLP, RBF), with R ≈ 0.1–0.8, do not outperform the first PTML model. This result confirms the validity of the linear model. Next, we carried out an experimental and theoretical study of previously unreported Parham reactions to illustrate the practical use of the PTML model. A 500,000-point simulation and a Hammett analysis of the reactivity space of Parham reactions are also reported.

    Funding: Ministerio de Economía y Competitividad (CTQ2016-74881-P; CTQ2013-41229-P); Gobierno Vasco (IT1045-16)
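    Predicting a query yield from several reference reactions and scoring it with Pearson's R can be sketched as follows. This assumes a linear form (reference yield plus a scaled descriptor perturbation) averaged over references; the coefficients `a0`, `a1`, `b`, the single descriptor, and the function names are hypothetical placeholders, not the published GLR equation.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ptml_yield(query_desc, references, a0, a1, b):
    """Average a linear PT-style prediction over several reference reactions.

    references -- list of (yield_ref, descriptor_ref) tuples.
    a0, a1, b  -- placeholder coefficients (not the paper's values).
    """
    preds = [a0 + a1 * y_ref + b * (query_desc - d_ref)
             for y_ref, d_ref in references]
    return mean(preds)


# Two hypothetical references bracketing the query descriptor value.
refs = [(80.0, 1.0), (60.0, 2.0)]
print(ptml_yield(1.5, refs, a0=0.0, a1=1.0, b=10.0))  # → 70.0
print(pearson_r([1, 2, 3], [2, 4, 6]))                # → 1.0
```

    Averaging over references smooths out the choice of any single reference reaction. On Python 3.10 and later, `statistics.correlation` computes the same coefficient as `pearson_r`.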

    Prediction of Antimalarial Drug-Decorated Nanoparticle Delivery Systems with Random Forest Models

    Drug-decorated nanoparticles (DDNPs) have important medical applications. This work combined Perturbation Theory with Machine Learning and Information Fusion (PTMLIF) to build models that predict the probability of nanoparticle–compound/drug complexes having antimalarial activity (against Plasmodium). The aim is to save experimental resources and time through virtual screening of DDNPs. The raw data were obtained by fusing experimental data for nanoparticles with compound chemical assays from the ChEMBL database. The inputs for the eight Machine Learning classifiers were transformed features of drugs/compounds and nanoparticles, expressed as perturbations of molecular descriptors under specific experimental conditions (experiment-centered features). The resulting dataset contains 107 input features and 249,992 examples. The best classification model was obtained with Random Forest, using 27 selected features of drugs/compounds and nanoparticles across all the experimental conditions considered. Its high performance was demonstrated by the mean Area Under the Receiver Operating Characteristic curve (AUC) in a test subset, 0.9921 ± 0.000244 (10-fold cross-validation). These results demonstrate the power of fusing the experiment-centered features of drugs/compounds and nanoparticles to predict nanoparticle–compound antimalarial activity. The scripts and dataset for this project are available in an open GitHub repository.

    This research and the APC were funded by Consolidation and Structuring of Competitive Research Units: Competitive Reference Groups (ED431C 2018/49), funded by the Ministry of Education, University and Vocational Training of Xunta de Galicia and endowed with EU FEDER funds.
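    The reported metric is a mean AUC over cross-validation folds. A minimal sketch of both pieces follows: AUC computed as the probability that a randomly chosen positive outscores a randomly chosen negative (the Mann-Whitney formulation), and a mean ± standard deviation summary of per-fold values. Reading the "±" as a sample standard deviation across folds is an assumption, and the function names and toy scores are illustrative.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison (ties count one half).

    labels -- 0/1 class labels; scores -- classifier scores for class 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_aucs):
    """Mean and sample standard deviation of per-fold AUC values."""
    return mean(fold_aucs), stdev(fold_aucs)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

    The pairwise loop is quadratic, which is fine for illustration but slow for a dataset of about 250,000 examples; a rank-based computation such as scikit-learn's `roc_auc_score` scales much better.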

    MATEO: intermolecular α-amidoalkylation theoretical enantioselectivity optimization. Online tool for selection and design of chiral catalysts and products

    The enantioselective Brønsted acid-catalyzed α-amidoalkylation reaction is a useful procedure for the production of new drugs and natural products. In this context, Chiral Phosphoric Acids (CPAs) are versatile catalysts for this type of reaction. The selection and design of new CPA catalysts for different enantioselective reactions is of dual interest, because both new CPA catalysts (tools) and chiral drugs or materials (products) can be obtained. However, this process is difficult and time-consuming if approached by experimental trial and error. In this work, a Heuristic Perturbation-Theory and Machine Learning (HPTML) algorithm was used to seek a predictive model of CPA catalyst performance, in terms of enantioselectivity, in α-amidoalkylation reactions, with R² = 0.96 overall for the training and validation series. It involved Monte Carlo sampling of >100,000 pairs of query and reference reactions. In addition, the computational and experimental investigation of a new set of intermolecular α-amidoalkylation reactions using BINOL-derived N-trifylphosphoramides as CPA catalysts is reported as a case study. The model was implemented in a web server called MATEO: InterMolecular Amidoalkylation Theoretical Enantioselectivity Optimization, available online at: https://cptmltool.rnasa-imedir.com/CPTMLTools-Web/mateo. This new user-friendly online tool could enable sustainable optimization of reaction conditions, leading to the design of new CPA catalysts along with new organic synthesis products.

    Funding: Ministerio de Ciencia e Innovación (PID2019-104148GB-I00; PID2022-137365NB-I00); Gobierno Vasco (IT1558-2)
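    Monte Carlo sampling of query/reference pairs, and the R² used to score the resulting model, can be sketched as below. Uniform random sampling with the only constraint that query and reference differ is an assumption; the published HPTML procedure may apply additional heuristic filters. The function names are illustrative.

```python
import random


def sample_pairs(n_reactions, n_pairs, seed=0):
    """Draw random (query, reference) index pairs with query != reference."""
    if n_reactions < 2:
        raise ValueError("need at least two reactions to form a pair")
    rng = random.Random(seed)  # seeded for reproducibility
    pairs = []
    while len(pairs) < n_pairs:
        q = rng.randrange(n_reactions)
        r = rng.randrange(n_reactions)
        if q != r:
            pairs.append((q, r))
    return pairs


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - m) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


pairs = sample_pairs(n_reactions=10, n_pairs=1000)
print(pairs[:3])
```

    Because each reaction can serve as both query and reference, a modest set of reactions yields many training pairs, which is why pair counts can far exceed the number of measured reactions.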

    Prediction of Antileishmanial Compounds: General Model, Preparation, and Evaluation of 2‑Acylpyrrole Derivatives

    In this work, the SOFT.PTML tool was used to pre-process a ChEMBL dataset of preclinical assays of antileishmanial compound candidates. A comparative study of different ML algorithms, such as logistic regression (LOGR), support vector machine (SVM), and random forests (RF), showed that the IFPTML-LOGR model has high specificity and sensitivity (81–98%) in the training and validation series. The use of this software is illustrated with a practical case study focused on a series of 28 derivatives of 2-acylpyrroles 5a,b, obtained through a Pd(II)-catalyzed C−H radical acylation of pyrroles. Their in vitro leishmanicidal activity against visceral (L. donovani) and cutaneous (L. amazonensis) leishmaniasis was evaluated, and compounds 5bc (IC50 = 30.87 μM, SI > 10.17) and 5bd (IC50 = 16.87 μM, SI > 10.67) were approximately 6-fold more selective than the reference drug (miltefosine) in in vitro assays against L. amazonensis promastigotes. In addition, most of the compounds showed low cytotoxicity (CC50 > 100 μg/mL in J774 cells). Interestingly, the IFPTML-LOGR model correctly predicts the relative biological activity of this series of acylpyrroles. A computational high-throughput screening (cHTS) study of 2-acylpyrroles 5a,b was performed, computing >20,700 activity scores across a large space of 647 assays involving multiple Leishmania species, cell lines, and potential target proteins. Overall, the study demonstrates that the SOFT.PTML all-in-one strategy is useful for obtaining IFPTML models through a friendly interface, making the work easier and faster than before. The present work also points to 2-acylpyrroles as new lead compounds worthy of further optimization as antileishmanial hits.

    Funding: Ministerio de Ciencia e Innovación (PID2019-104148GB-I00); Gobierno Vasco (IT1558-22)
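    The selectivity index is the ratio of host-cell cytotoxicity to antiparasitic potency, SI = CC50 / IC50, with both values in the same units. Since CC50 is given in μg/mL and IC50 in μM, a molar mass is needed to convert between them. The sketch below uses a hypothetical molar mass of 250 g/mol and hypothetical IC50 value; they are not the values for 5bc or 5bd.

```python
def ug_per_ml_to_um(conc_ug_ml, molar_mass_g_mol):
    """Convert a mass concentration (µg/mL) to a molar concentration (µM)."""
    # µg/mL equals mg/L; mg/L divided by g/mol gives mmol/L; ×1000 gives µM.
    return conc_ug_ml * 1000.0 / molar_mass_g_mol


def selectivity_index(cc50_um, ic50_um):
    """SI = CC50 (host cells) / IC50 (parasite), both in µM."""
    return cc50_um / ic50_um


# Hypothetical compound: molar mass 250 g/mol, CC50 > 100 µg/mL, IC50 = 40 µM.
cc50_lower_bound = ug_per_ml_to_um(100, 250)            # 400 µM
print(selectivity_index(cc50_lower_bound, 40))           # → 10.0
```

    When CC50 is only known as a lower bound (">100 μg/mL"), the computed SI is also a lower bound, which is why selectivity values are reported as "SI >".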

    Modeling Antibacterial Activity with Machine Learning and Fusion of Chemical Structure Information with Microorganism Metabolic Networks

    Predicting the activity of new chemical compounds against pathogenic microorganisms with different metabolic reaction networks (MRNs) is an important goal, because these microorganisms differ in their susceptibility to antibiotics. The ChEMBL database contains >160,000 outcomes of preclinical assays of antimicrobial activity for 55,931 compounds, with >365 activity parameters (MIC, IC50, etc.) and >90 bacterial strains of >25 bacterial species. In addition, the Leong and Barabási data set includes >40 MRNs of microorganisms. However, no existing model can predict antibacterial activity across multiple assays while considering both drug and MRN structures at the same time. In this work, we combined perturbation theory, machine learning, and information fusion techniques to develop the first PTMLIF model. The best linear model found had specificity = 90.31/90.40 and sensitivity = 88.14/88.07 in the training/validation series. We compared it with nonlinear artificial neural network (ANN) techniques and with previous models from the literature. Next, we illustrated the practical use of the model with an experimental case study. We report for the first time the isolation and characterization of terpenes from the plant Cissus incisa. The antibacterial activity of these terpenes was determined experimentally. The most active compounds were phytol and α-amyrin, with MIC = 100 μg/mL against vancomycin-resistant Enterococcus faecium and carbapenem-resistant Acinetobacter baumannii. Although these compounds are already known from other sources, here they were isolated from C. incisa and evaluated for the first time against several strains of multidrug-resistant bacteria, including World Health Organization (WHO) priority pathogens. Finally, we used the model to predict the activity of these compounds against other microorganisms with different MRNs in order to find other potential targets.

    Funding: Ministerio de Economía y Competitividad (CTQ2016-74881-P); Gobierno Vasco (IT1045-16)
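    Sensitivity, specificity, and accuracy all follow from the four cells of a binary confusion matrix. A minimal sketch follows; the counts in the usage line are hypothetical and unrelated to the reported values.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as percentages.

    tp/fn -- actives predicted active/inactive
    tn/fp -- inactives predicted inactive/active
    """
    sensitivity = 100.0 * tp / (tp + fn)  # true positive rate
    specificity = 100.0 * tn / (tn + fp)  # true negative rate
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


print(classification_rates(tp=90, fn=10, tn=80, fp=20))  # → (90.0, 80.0, 85.0)
```

    Reporting sensitivity and specificity separately, as in the abstract, is more informative than accuracy alone when active and inactive outcomes are imbalanced.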

    Prediction of Anti-Glioblastoma Drug-Decorated Nanoparticle Delivery Systems Using Molecular Descriptors and Machine Learning

    The theoretical prediction of drug-decorated nanoparticles (DDNPs) has become a very important task in medical applications. In this paper, Perturbation Theory Machine Learning (PTML) models were built to predict the probability that different pairs of drugs and nanoparticles form DDNP complexes with anti-glioblastoma activity. PTML models use as inputs the perturbations of the molecular descriptors of drugs and nanoparticles under specific experimental conditions. The raw dataset was obtained by combining the nanoparticle experimental data with drug assays from the ChEMBL database. Ten types of machine learning methods were tested. Only 41 features were selected for 855,129 drug–nanoparticle complexes. The best model was obtained with the Bagging classifier, an ensemble meta-estimator based on 20 decision trees, with an area under the receiver operating characteristic curve (AUROC) of 0.96 and an accuracy of 87% (test subset). This model could be useful for the virtual screening of nanoparticle–drug complexes in glioblastoma. All the calculations can be reproduced with the datasets and Python scripts, which the authors have made freely available in a GitHub repository.

    The APC was funded by IKERDATA, S.L. under grant 3/12/DP/2021/00102, Area 1: Development of innovative business projects, from the Provincial Council of Vizcaya (BEAZ for the Creation of Innovative Business Ventures).
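    Bagging trains each base learner on a bootstrap resample (drawn with replacement) of the training data and combines predictions by majority vote. The sketch below shows that mechanism with 20 estimators, but swaps in one-feature decision stumps for full decision trees to stay short; the function names and toy data are illustrative, not the published model.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-feature threshold classifier that minimizes training errors."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def fit_bagging(xs, ys, n_estimators=20, seed=0):
    """Train n_estimators stumps, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Majority vote over the ensemble."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


# Toy one-dimensional data: low values inactive (0), high values active (1).
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]
models = fit_bagging(xs, ys, n_estimators=20, seed=0)
print(predict_bagging(models, 0.05), predict_bagging(models, 0.95))
```

    In practice, scikit-learn's `sklearn.ensemble.BaggingClassifier` implements this scheme with decision trees as the default base estimator and `n_estimators` controlling the ensemble size.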